mRMRe: an R package for parallelized mRMR ensemble feature selection

Authors

  • Nicolas De Jay
  • Simon Papillon-Cavanagh
  • Catharina Olsen
  • Nehme Hachem
  • Gianluca Bontempi
  • Benjamin Haibe-Kains
Abstract

MOTIVATION: Feature selection is one of the main challenges in analyzing high-throughput genomic data. Minimum redundancy maximum relevance (mRMR) is a particularly fast feature selection method for finding a set of both relevant and complementary features. Here we describe the mRMRe R package, in which the mRMR technique is extended by using an ensemble approach to better explore the feature space and build more robust predictors. To deal with the computational complexity of the ensemble approach, the main functions of the package are implemented and parallelized in C using the OpenMP Application Programming Interface.

RESULTS: Our ensemble mRMR implementations outperform the classical mRMR approach in terms of prediction accuracy. They identify genes more relevant to the biological context and may lead to richer biological interpretations. The parallelized functions included in the package show significant gains in terms of run-time speed when compared with previously released packages.

AVAILABILITY: The R package mRMRe is available on the Comprehensive R Archive Network and is provided open source under the Artistic-2.0 License. The code used to generate all the results reported in this application note is available from Supplementary File 1.

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
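The selection idea summarized in the abstract can be illustrated with a small greedy sketch. This is not the mRMRe implementation: the abstract does not specify the package's estimators or ensemble strategy, so the sketch assumes absolute Pearson correlation as a stand-in for mutual information and builds an "ensemble" simply by starting one greedy run from each of the most relevant features. The function names `pearson`, `mrmr_select`, and `mrmr_ensemble` and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mrmr_select(features, target, k, first=None):
    """Greedy mRMR-style selection (assumption: |Pearson r| replaces mutual information).

    features: dict mapping feature name -> list of values
    first:    optional feature name forced as the first pick
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    remaining = list(features)
    selected = []
    if first is not None:
        selected.append(first)
        remaining.remove(first)

    def score(name):
        # Relevance to the target minus mean redundancy with already selected features.
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def mrmr_ensemble(features, target, k, n_solutions):
    """One greedy run per top-relevance seed (a simplified ensemble, assumed for illustration)."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    seeds = sorted(features, key=lambda n: relevance[n], reverse=True)[:n_solutions]
    return [mrmr_select(features, target, k, first=seed) for seed in seeds]


# Toy data: "a_dup" is nearly a copy of "a"; "c" is less relevant but complementary.
features = {
    "a": [1, 1, -1, -1, 1, 1, -1, -1],
    "a_dup": [1, 1, -1, -1, 1, 1, -1, -0.5],
    "c": [1, -1, 1, -1, 1, -1, 1, -1],
}
target = [3, 1, -1, -3, 3, 1, -1, -3]

print(mrmr_select(features, target, 2))
print(mrmr_ensemble(features, target, 2, 2))
```

In the toy data, ranking by relevance alone would pick `a` and `a_dup`, whereas the redundancy penalty favors the complementary feature `c` as the second pick. Each ensemble run is independent of the others, which is what makes this kind of approach a natural candidate for parallelization.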

Similar resources

A New Framework for Distributed Multivariate Feature Selection

Feature selection is an important issue in classification. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized setting. In this paper, we suggest a distributed version of the mRMR featu...

Audio Genre Classification with Semi-Supervised Feature Ensemble Learning

Widespread availability and use of music have made automated audio genre classification an important field of research. Thanks to feature extraction systems, not only music data, but also features for them have become readily available. However, hand-labeling a large amount of music data is time-consuming. In this study, we introduce a semi-supervised random feature ensemble method for audio ...

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...

LIBGS: A MATLAB software package for gene selection

Many gene selection algorithms have been applied in gene expression data analysis successfully. To solve different developing environments of these toolkits, such as rankgene (Su et al., 2003), and mRMR (http://research.janelia.org/peng/proj/mrmr/index.htm), perform data analysis and make algorithm comparison more flexible, we have developed a software package LIBGS including: 1) Seven new gene...

bartMachine: Machine Learning with Bayesian Additive Regression Trees

We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and cap...

Journal title:
  • Bioinformatics

Volume 29, Issue 18

Pages -

Publication date: 2013